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(N Abstract 

We describe a reliable Krylov-subspace method for estimating the spectral condition number 
h— 5 of a matrix A. The main difficulty in estimating the condition number is the estimation of the 

j- — smallest singular value <7 m i n of A. Our method estimates this value by solving a consistent least- 

squares minimization problem with a known minimizer using a specific Krylov-subspace method 
called LSQR. In this method, the forward error tends to concentrate in the direction of a singular 
vector corresponding to cr min . Extensive experiments show that the method is extremely reliable. 
It is often much faster than a dense SVD and it can sometimes estimate the condition number 
when running a dense SVD would be impractical due to the computational cost or the memory 
O requirements. The method uses very little memory (it inherits this property from LSQR) and it 

works equally well on square and rectangular matrices. 

> 

O 1 Introduction 
O 

Estimating the smallest singular value cr m i n of a matrix is difficult. Dense SVD algorithms can 
approximate cr m i n well and their running time is predictable, but they are also slow. Furthermore, 
dense SVD algorithms require space that is proportional to mn when the matrix is m-by-n, which 

C*") is impractical for large sparse matrices. 

Symmetrization is not an effective way to address the problem. If we work with the Gram 
matrix A* A, we cannot estimate condition numbers k(A) = cr max /cr m i n greater than 1 / ^/ ^machine j 

^ where e mac hino is the unit roundoff (machine precision) . If we work with the augmented matrix 

r A* 
A 

Cmin(A) is transformed into a pair of eigenvalues ±<r m i n in the middle of the spectrum. Such 
eigenvalues are difficult to compute accurately with Lanczos and its variants^] and it is essentially 
impossible for such algorithms to determine that there is no eigenvalue closer to zero than the one 
that has already been computed. 



For example, the ARPACK User's Guide states that a shift-invert iteration is usually required to compute eigen- 
values in the interior of the spectrum [121 Section 3.4]. Experiments with ARPACK on some of the matrices presented 
later in the paper, whose condition number our method was able to estimate, showed that ARPACK does not converge 
on them when it tries to compute the smallest-magnitude eigenvalues of the augmented matrix without inversion. 
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This paper describes a Krylov-subspace method that can estimate <7 m in and hence the spectral 
condition number c max /cr m i n accurately. Our method is reliable in the sense that it does not 
incorrectly report significant overestimates of u m i n as accurate (at least when A is not close to being 
rank deficient). Our method is also robust since it requires very little memory. 

Our method also has some flaws. The main one is that it sometimes converges very slowly, 
making is essentially impossible to compute cr m i n . Our experience shows that the method always 
converges eventually, but that convergence might be too slow to be of practical use. There is no good 
way to determine how close the method is to termination, although it tends to behave consistently 
on related matrices (e.g. from the same application area). When K^A^j is close to ^ m ^chinc' 

the 

method sometimes overestimates <7 m in by several orders of magnitude, but it still returns a small 
estimate (smaller than cr max X 10 -11 in our experience, with e mac hine ~ 1CT 16 ). 

Even with these flaws, to the best of our knowledge this method is the only practical way to 
compute (J m i n with reasonable accuracy (to within a factor of 2 or better) on many large matrices. 

The key idea of our method is to apply LSQR, a Krylov-subspace least-squares solver, to mini- 
mize || Ax — b\\ (all norms in this paper denote the 2-norm unless stated otherwise) when we already 
have an x* that satisfies Ax* = b. Thanks to having x*, we can compute the forward error, which 
allows us to exploit the tendency of LSQR to concentrate the forward error in the direction of 
a singular vector associated with <r m i n . This is often seen as a flaw in LSQR and in Conjugate 
Gradients (LSQR is mathematically equivalent to Conjugate Gradients applied to A T Ax = A T b). 
These solvers converge slowly when the coefficient matrix A is ill conditioned because it is difficult 
for them to get rid of the error in the small subspaces of A. Our method exploits this flaw, which 
acts as a sieve that captures a vector from this subspace. 

The method sometimes converges very rapidly and sometimes very slowly. This is not related 
to the size of the problem and not to how ill conditioned it is, but to the distribution of singular 
values. 

The rest of this paper is organized as follows. Section [2] surveys related work on condition 
number estimation. Section [3] describes the core of our algorithm. Section [4] explains how it works 
using mostly visualizations of numerical experiments that display the singular-vector concentration 
effect. Section [5] analyzes the unique stopping criteria that our method relies upon. The bulk of 
our experimental evidence is summarized in Section [6j 

2 Related Work 

The large singular value a max of A can be computed accurately using a bounded number of matrix- 
vector multiplications involving A and A* . This can be done using the power method, for example, 
whose analysis for this application we explain below. The Lanczos method can reduce the number of 
matrix- vector products even further [TT]. Random projection methods can also estimate <7 max (^4) |6 . 

Estimating o" m ; n is computationally more challenging, because applying the pseudo-inverse is 
usually much harder than applying A itself. In general, existing random projection methods cannot 
efficiently estimate 0" m i n (j4) unless a decomposition of A is computed, or A is low rank (or numeri- 
cally low rank). If A is low rank, random projection methods can be used to estimate ak(A), where 
k is the (numerical) rank [BJ. 

The LINPACK condition-number estimator requires a triangular factorization of A (see Higham's 
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monograph [8j Chapter 15] for details on this and related estimators). The Gotsman- Toledo [5j 
and the Bischof et al. [2] condition-number estimators, which are specialized to sparse matrices, 
also require a triangular factorization. Estimators that require a triangular factorization are less 
expensive than the SVD, but they still cannot be applied to huge matrices. 

The LAPACK condition-number estimator relies on repeated applications of the pseudo-inverses 
of A and A* [7] . One way to apply them is using a factorization, but they can also be applied using an 
iterative solver. With an effective preconditioner, repeated applications of the pseudo-inverse may 
be less expensive than the method that we propose, but without one our method is less expensive. 
Kenny et al. [9] describe a way to estimate the condition number of a square matrix using a single 
application of the inverse to one or several vectors. 

The spectral condition number measures the norm-wise sensitivity of matrix-vector products 
and linear systems to small perturbations in the inputs. There are methods that estimate more 
focused metrics, such as the sensitivity of individual components of the inputs or output [9]. Our 
method does not address this problem. 

3 The Algorithm 

This section describes our algorithm for estimating the condition number of A. A detailed 
pseudo-code description appears in Algorithm [T} 

The algorithm starts by estimating cr max (A) and a corresponding certificate vector using power 
iteration on A* A. We perform enough iterations to estimate <7 max to within 10% with probability 
at least 1 — 1CP 12 . Using a bound due to Klein and Lu [TQJ Section we find that given a 

relative error parameter e and a failure probability parameter 5, if we perform 

](m ( 2nf + m( e L))" 

iterations, the relative error in our approximation is less than e with probability at least 1 — 5. For 
the parameters e = 10 -1 and 5 = 1CP 12 , 1004 iterations suffice even for matrices with up to 10 9 
columns. For e = 1/3 and S = 10~ 12 , only 298 iterations suffice for matrices with up to 10 9 columns. 
(The accuracy of the <7 max estimate in the power method is typically much higher than predicted by 
this bound, but the additional accuracy depends on the gap between the largest and second-largest 
singular values; the bound that we use makes no assumption on the gap.) 

The main phase of the algorithm uses a slightly-enhanced LSQR iteration |13| to estimate 
<7 m i n and to produce a corresponding certificate vector. The algorithm first generates a uniformly- 
distributed random vector x* on the unit sphere by first generating a vector x with normally- 
distributed independent random components, and setting x* = x/\\x\\. The algorithm multiplies it 
by A to produce a consistent right-hand side b = Ax*. 

Now the algorithm runs LSQR on this b, using Paige and Saunders's original formulation \1'6\ 
pages 50-51]. LSQR minimizes \\Ax — b\\ iteratively using a Lanczos-type bidiagonalization pro- 
cedure. It is mathematically equivalent to solving the normal equations A* Ax = A*b using the 
Conjugate Gradients algorithm, but it behaves much better numerically. 

2 Note that the statement of Lemma 6 in [10] is incorrect; the proof shows the correct bound. Also, the discussion 
that follows the proof of the lemma repeats the error in the statement of the lemma. 
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Algorithm 1 The condition number estimation algorithm. 



1: Input: A € R mxn 

2: Parameters and defaults: c Y (8e machinc ), c 2 (1CT 3 ), c 3 (64/e machinc ), c 4 (^e machinc ) and c[ (4e machino ) 
3: 

4: Estimate (7 max = <7 ma x(A), along with a certificate u max , using power iteration. 

5- f^min — ^maxi ^min ^max 

6: Draw a random vector x € R n with independent normal entries 

7: T^cri-\c 2 )/\\x\\ 

8: x* «- x/\\x\\ 

9: b <- Ax* 

10: 13^ <_ || b || 5 u (0) ^ 6/^(0) 

11: «(°) <- a (°) <- <- «(°)/a(°) 

12: <- «(°) 

13: :r (0) «- nx i 

14: 0(0) ^ j g(0) ; ^(0) ^ a 

15: T <- oo {Value of T is set later.} 

16: for t = 1,...,T do 

17: «W A^*" 1 ) - raC- 1 ' 

18: /9<*> «- ||«W|| 

19: «W «(*>/#(*) 

20: «W 4- - jSWvW 

21: 4- ||«W|| 

22: t)W<-t)W/tt (t) 

23: p« <- ||( pC*" 1 ) )|| 

24: c« <- p^" 1 ) /pW , s« <- /pW 

25: 0(*> <-*(*)<*(*>, p<*> «- -cWttW 

26: ^ C (t)0(*-1), ^ S 0(*-1) 

27: x(V^ f - 1 ) + (^)/pW)«)( t - 1 ) 

28: u;(*) «-«(*)_ (0(*)/p(*))ii,(t-l) 

29: ii^f 4— pW {Only diagonal and superdiagonal of R are kept in memory} 

30: if t > 1 set # t ( !_ } M 4- 6K*" 1 ) 

31: <- a;* - X® 

32: if = set & min «- CT max , min «- v max and break for 

{For matrices with 1 the probability of getting d^' — is 0}. 

33: if \\AdW\\ < <7 min ||<i«|| then 

34: a min ^||^(*)||/||rfW||,i) min ^rfW 

35: end if 

36: if o- max /a min > c 4 then ci 4- ci 

37: if not converged (T = oo) and - Cl ° r " d(t) " - T or J m M /%n > c 3 ) then 

38: T «— |"1.25t| 

39: end if 

40: end for 

41: Estimate <7 min = <T m in(-R^), using inverse power iteration 

42: CT min 4- min(o- min ,(T min ) 

43: 

44: return o- max ,CT min , w max and v min , a min 
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Our algorithm adds a few steps to each LSQR iteration. At the end of each (standard) LSQR 
iteration, we have an updated approximate solution and an estimate of \\r®\\ = ||j4xW — b\\, 
denoted by This estimate of \\r®\\ is mathematically correct but the equality of <f>® and 
depends on the orthogonality of the Lanczos vectors, which lose orthogonality in floating point 
arithmetic as the algorithm progresses. Our algorithm also computes d® = x* — x® and 
We have 



Ad® 



A (x* - x®) 
b - Ax® 



.(*) 



which in exact arithmetic equals (j>® , but to improve the robustness of the algorithm we compute 
[|vlcZW|| explicitly. (In our numerical experiments we have found cj>® to be an accurate estimate, 
but we prefer to avoid any reliance on the orthogonality of the Lanczos vectors in our algorithm.) 
We also compute ||x^||. 

Next, the algorithm computes the ratio [|-A<ZW||/||dw||, which like any Rayleigh quotient is an 
upper bound on cr m i n . If this ratio is the smallest we have seen so far, the algorithm treats it as an 
estimate of a m { n and stores both the ratio and the certificate d® . When the algorithm terminates, 
it outputs the best ratio it has found and the corresponding certificate. 

We use three stopping criteria. The first stopping criterion is the one used by the standard 
LSQR algorithm [T3] for consistent systems: 





r («) 




<5"max | 


x®\ 


+ \\b\\ 



< Cl 



(3.1) 



where <7 max is our estimate of || A\\ and c\ is a parameter that is set by default to 8e mac hi n e- It has been 
observed experimentally [3] that for consistent systems, as long as c\ = fi(e ma chine) this criterion will 
be eventually met in spite of the loss of orthogonality in the biorthogonalization process; however, 
the residual norm does not seem to decrease much below the value required to satisfy ( |3.1| ) [3], so 
a much smaller c\ cannot be used. 

In many cases our second stopping criterion, which is non-standard, will stop LSQR well before 
the residual is that small. This second condition is 



d® 



< 



(3.2) 



where erf -1 is the inverse error function (we use a numerical approximation of erf _1 (c2)), and C2 is 
a parameter that is set by default to 10~ 3 . We explain this stopping criterion and how the choice 
of C2 affects the algorithm later, in Section [5} 
The third stopping criterion is 

ll^"- (3.3) 



\\d®\\ 



> C3 , 



where C3 is a parameter that is set by default to 64/e mac hine- In other words, at this threshold we 
consider the matrix to be numerically rank deficient and we do not attempt to estimate the exact 
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condition number. This criterion is used in the standard LSQR algorithm |13] as a regularizing 
criterion. 

To achieve good accuracy even for matrices that are terribly ill conditioned (condition number 
close to 1/emachine)) the stopping criteria are refined in two additional ways: 

1. If at some point we have 

\\Ad&l 

\m\ - C4 ' 

where C4 is a parameter that is set by default to i/e mac hme> we set c\ (residual-based stopping 
threshold) to Cj_, which is set by default to 4e mac hine- 

2. Even when the method detects convergence using one of its three criteria (small residual, small 
error, and numerical rank deficiency), it keeps iterating. The number of extra iterations is one 
quarter of the number performed until convergence was detected. This rule is a heuristic that 
tries to improve the accuracy of the condition number estimate. The cost of this heuristic is 
obviously limited and it can be turned off by the user. 

There is one more twist to the algorithm. The algorithm stores the matrix R^\ one of two 
bidiagonal matrices that LSQR incrementally constructs but normally discards. As in the symmetric 
Lanczos algorithm, the singular values of converge to the singular values of A. Once the 
algorithm terminates, we compute <7 m i n « a m i n (R^); if it is smaller than the best ||/||d^ || 

estimate, we output both estimates. One estimate (5- m in) is tighter, but it comes with no certificate 
vector; the other is looser, but comes with a certificate. Generating the certificate for the Lanczos 
estimate requires storing the Lanczos vectors or repeating the iterations, both of which we consider 
to be too expensive. 

Storing RW and estimating a m i n (R^) is relatively inexpensive since R^ is bidiagonal. We 
estimate it by running inverse iteration on Ry) , again performing enough iterations to get to within 
10% with very high probability. Since we use power-iteration, the error in a m ; n is one sided: it 
is always the case that <5- m ; n > cr m m(R^) (because it is generated by a Rayleigh quotient). Also, 
0"min(-R^) > a m i n (A). Therefore, a m \ n is also an upper bound on a m i n (A), not just an estimate. 



4 Rationale 

This section explains the rationale behind the method using several illustrative numerical experi- 
ments. All the experiments were done on 1000-by-400 matrices with real entries with prescribed 
singular values and random singular vectors. 

Let us begin the discussion with a matrix that has 300 singular values that are distributed 



logarithmically between 10 -3 and 10~ 2 , 10 values at 10~ 8 , and 90 at 1. Figure 4.1 shows the 
behavior of our method on one matrix generated with this spectrum. In the first 90 iterations or so 
the residual diminishes logarithmically; the norm of — x* drops a bit but then stops dropping 
much. These two effects cause our estimate to also diminish roughly logarithmically. The Lanczos 
estimate (cr m i n (R^)) drops a bit initially but stagnates from iteration 30 or so. What is happening 
up to iteration 90 or so is that the Lanczos bidiagonalization resolves the singular values in the 
10- 3 -to-10- 2 cluster, while LSQR removes much of the projection of the corresponding singular 
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Random x* and consistent b = Ax* 




Iteration 



Figure 4.1: The behavior of the method on one matrix described in the text. 



vectors from the residual and from the error. Around iteration 160 Lanczos has resolved enough of 
the spectrum in the 10~ 3 -to-10 -2 cluster and the small singular value of starts moving toward 
1CP 8 . At that point, most of the remaining error consists of singular vectors corresponding to the 
singular value 10™ 8 , which causes our estimate to be accurate (to within 9 decimal digits!). The 
norm of — x* is still large, because the error contains a significant component in the subspace 
associated with the 10 -8 singular values. At this point, LSQR starts to resolve the error in this 
subspace, the residual starts decreasing again, the norm of — x* starts decreasing, which causes 
our stopping criterion to be met. 

Figure 4.2 visualizes the behavior described in the previous paragraph. We see that up to around 
iteration 50, the error associated with singular spaces associated with singular values other than 1 
remains very large. From that point on to about iteration 150, the error in subspaces corresponding 
to values between 10 -3 and 10~ 2 is resolved, but the error associated with the singular value 10~ 8 
is still very large. This is a point where our method finds the small singular value and its certificate 
(the error). As LSQR starts to resolve the error in the 10 -8 singular subspace, the errors in the 
10~ 3 to 10~ 2 subspaces grows (perhaps due to loss of orthogonality) but they are reduced again 
later. 

The method yielded similar results when the small singular value was moved down to 10 -13 , 
with convergence after about 440 iterations and an estimate that is correct to within 5 decimal 
digits. The value cr m i n = 10~ 13 is about the lower limit for which the stopping criterion is 

useful. 

When A is rank deficient there is more than one solution to the system Ax = b. The solution x*, 
which was generated randomly, has no special property that distinguishes it from other solutions 
(like minimum norm), so no least-squares solver can recover x*. Therefore, it is unlikely 
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Iog 10 of the projection of the error on singular vectors 
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Index of singular triplet 



150 200 250 

Index of singular triplet 



400 



Figure 4.2: The projection of the forward error on the right singular vectors in the experiment 



shown in Figure 4.1 The bottom plot shows the singular values and the top image shows the 
projections. Blue represents values near e mac hinej green represents values near y / e rnac hi n e, and red 
represents values near 1. 
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Random x* and consistent b = Ax* 
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Figure 4.3: The behavior of the method on one matrix described in the text. 



will become small enough to cause our method to stop. In this case, the method stops because the 
residual eventually becomes very small (close to e mac hi ne ) or because the estimated condition number 



becomes too big (stopping condition ( |3.3[ )). In Figure 4.3 we illustrate the behavior of the algorithm 
on a rank deficient matrix; the matrix has 10 singular values that are 10 -16 (numerical zeros), 300 
distributed logarithmically between 10 -3 and 10 -2 , and 90 at 1. The algorithm stopped because the 
condition number got too big, however a few iterations later it would have stopped because 
became too small (stopping condition (3.1 )). Our estimate is not very accurate (around 1.8 x 10~ 15 ). 



This still indicates to the user that the matrix is numerically rank deficient, although the algorithm 
cannot tell the user exactly how close to e~a C hine * ne condition number is (and certainly not whether 
it is higher). Stopping criteria (3.1) and ( |3.3[ ), and the relatively- inaccurate estimates they yield, 
are used only when the matrix is close to rank deficiency (condition number of about 10 14 or larger). 

Can we estimate <r m i n by minimizing \\Ax — b\\ with a random b, which with high probability 
is inconsistent if A has fewer columns than rows or is rank deficient? On some matrices, applying 
the pseudo inverse of A to such a b produces a minimizer with a norm that is larger than the norm 
of b by about a factor of 0"~ m . However, unless there is a large gap between the smallest singular 
values and th e res t, the minimizer has a larger norm and the norms ratio fails to accurately estimate 
(7~J n . Figure 4.4 shows the relative errors in this estimate for matrices whose singular values are 
distributed linearly between cx m i n and 1. The errors are huge, sometimes by more than 3 orders of 
magnitude. This is not a particularly useful method. This method amounts to one half of inverse 
iteration on A* A, so it is not surprising that it is not accurate; performing more iterations would 
make the method very reliable, but at the cost of applying the pseudo-inverse many times. 

This estimator is clearly biased (the estimate is always larger than <7 m i n ); so is any fixed number of 
steps of inverse iteration. Kenny et al. [2] derive a unbiased estimator of this type for the Frobenius- 
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Figure 4.4: Estimating cr m i n by ||^4. + fr|| 1 for a unit-length random b with normal independent 
components. 



norm condition number. To the best of our knowledge, this is not possible in the Euclidean norm. 

Other distributions of the singular values lead to more accurate estimates in this method. But 
will this method work when the least-squares minimizer is an iterative method like LSQR? The 
following experiment suggests that the answer is no. The matrix used in the experiment has 50 
singular values distributed logarithmically between 10~ 10 and 10~ 9 , 50 more distributed logarithmi- 
cally between 10 _1 and 1, and the rest are all 1. The results, presented in Figure 4.5, indicate that 
there is no good way to decide when to terminate LSQR when used in this way to estimate cx m i n . 
We obviously cannot rely on the residual approaching e mac hme> because the problem is inconsistent. 
The original LSQR paper [T3] suggests another stopping condition, 



0",, 



-(*)! 



but our experiment shows that this ratio may fail to get close to e mac hine- In our experiment the 
best local minimum is around 10 -10 , six orders of magnitude larger than e mac hine! Moreover, at 
that local minimum, around iteration 60, the estimate \\b + r'*' || is still near 1, very far from 
cr m i n . The Lanczos estimate <7 m i n (i?^) is also very inaccurate at that time. There does not appear 
to be a good way to decide when to stop the iterations and to report the best estimate seen so far. 

On matrices with this singular value distribution, our method detects convergence after 2400- 
2500 iterations, returning a certified estimate of <7 m i n that is accurate to within 15-40% (the accuracy 
of the Lanczos estimate is better, with relative errors smaller than 10%). Figure 4.6 shows a typical 
run. The number of iterations is large, but the stopping criteria are robust. 
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Random b (inconsistent system) 
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Figure 4.5: Running LSQR with an inconsistent right-hand-side b. 
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Random x* and consistent b = Ax* 
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Figure 4.6: Running our method on the same matrix as in Figure 4.5 
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5 Analysis of the Small-Error Stopping Criterion 



Now that we understand how the method works, we can explain how to derive the small-error 



stopping criterion (3.2). Suppose that the smallest singular value of A is simple and that it is well 
separated from larger singular values. Let x* = Ya=i a % v i be the initial vector represented in the 
basis of the right singular vectors of A. As LSQR progresses towards finding x* it will tend initially 
to resolve components in the direction of the largest singular vectors. Since the v n direction is not 
present in x^ ' = we expect it to be not present during the initial iterations, i.e. oJ'iW ~ 0. This 
implies that we expect for the initial iterations t to have \v^{x* — x^)\ ~ \a n \ so — x®\\ > a n . 
Now, at some point in the iteration, the solution x® will be roughly x^ ~ J27=i a i v ii i- e - the 
error remains mostly in the direction of the small singular subspace, but the v n direction is not 
present at all. At that point, — x&)\\ ~ \a n \. LSQR will now start to resolve that error at 
least partially and the norm of the error will decrease below \a n \. If we stop the iteration when 
\\x* — x^\\ 3> \a n \, the error is unlikely to be a good estimate of a small singular vector. If we 
stop when \\x* — x^\\ < \a n \ we will likely have a good estimate of a small singular value. Ideally, 
we want to stop immediately when — x®\\ drops below \a n \. Stopping later (when the error is 
much smaller than \a n \) does not do any harm, since we report the best Rayleigh quotient seen, 
but it does not improve the estimate by much. 

We do not know a n = v^x* (which is also a random variable), but we can do a test for which 
passing it implies that Ha?* — x®\\ < \a n \ with high probability. Recall that x* = x/\\x\\ where x is a 
vector with normally-distributed independent random components. Let &w be the LSQR estimates 
that we would have found if we run LSQR on b = Ax, and let a n = v^x. Clearly, = \\x\\x® and 
a n = \\x\\a n , so ||x* — x^'\\ < \a n \ if and only if \\x — x^\\ < \a n \. Therefore, if we find a value r 
such that Pr(|d n | > r) > 1 — C2 then if \\x — x^ \\ < r then ||x — x^ \\ < \a n \ (and \\x* — x^ \\ < \a n \) 
with probability of at least 1 — C2- The condition ||x — x^'\\ < r is equivalent to — x^\\ < r/||x||. 
Since \\v n \\ = 1 we have a n ~ ^(0, 1). Therefore, for any < ci < 1 

Pr(|a„| >err 1 (c 2 )) = l-c 2 . 

This immediately leads to the stopping criterion 

CO [I / CTrl ( c 2) 



\\ar - iw < 

11*11 

The choice of c 2 in the algorithm determines our confidence that \\x* — x®\\ dropped below \a n \. 
We use a default value of 10 -3 , which implies a small probability of failure, but not a tiny one. The 
user can, of course, change the value if he needs higher confidence (in either case the error is one 
sided). Another approach is to use a much larger c 2 and to repeat the algorithm £ times (possibly 
in a single run, exploiting matrix-matrix multiplies) , say £ = 3. The probability that we succeed 
in at least one run is at least 1 — c|. For c 2 = 1CP 2 , say, setting £ = 3 or so should suffice. In our 
experience it is better to make c 2 smaller than to set £ > 1, but we did not do a formal analysis of 
this issue. 

If the small singular value is multiple (associated with a singular subspace of dimension k > 1), 
our situation is even better, because we can stop when x* — x w J2i=n-k+i a i v ii w hen \\x* — x\\ as 
Jo-n-k+i + ' ' " + a n> which is even more likely to be larger than our stopping criterion. 



12 



Random x* and consistent b = Ax* 
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Figure 6.1: Singular values are distributed linearly from 10 up to 1. 

When the small singular value is not well separated, the stopping criterion is still sound but 
the Rayleigh quotient estimate we obtain is not as accurate, because in such cases x* — x tends to 
be a linear combination of singular vectors corresponding to several singular values. These singular 
values are all small, but they are not exactly the same, thereby pulling the Rayleigh quotient up a 
bit. 



6 Additional Experiments 

6.1 Additional Illustrative Examples 



In Figure 6.1 all the singular values are distributed linearly from 10~ 8 up to 1. Convergence is 
fairly slow. The gap between a m - m = 0400 = 10~ 8 and 0399 is relatively large, around so a m - m 
is computed accurately. When many singular values are distributed logarithmically or nearly so, 
convergence is very slow and the small relative gap between (j m i n and the next-larger singular values 



causes the method to return a less accurate estimate. Figure 6.2 plots the convergence when 200 
singular values are distributed logarithmically between 10 -3 and 1 and the rest are at 1. We do 
not see a a period of stagnation during which the error is a good estimate of v m i n . The certified 
estimate is only accurate to within 31% and the Lanczos estimate to within 10% (much worse than 
when the small singular value is well separated from the rest). 

LSQR might have several periods of stagnation. This happens when the spectrum contains 
several well-separated clusters. Figure |6.3| plots the convergence when the matrix has a multiple 
singular value at 10~ 10 , a multiple singular value at 10~ 7 (both with multiplicity 10), 300 singular 
values that are distributed logarithmically between 10~ 3 and 10™ 2 , and the rest are at 1. We see 
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Random x* and consistent b = Ax* 




multiple stagnation periods of both the residual, the error, the Lanczos estimate, and our certified 
estimate. 

6.2 Experiments on Large Structured Random Matrices 

The next set of experiments was performed on sparse matrices that motivated this project. These 
matrices have exactly 3 nonzeros per column, where the location of the nonzeros is random and 
uniform and their values are +1 or —1 with equal independent probabilities. These type of matrices 
arise in trying to simulate the evolution of random 2-dimensional complexes in various stochastic 
models jT]. Such m-by-n matrices tend to be well conditioned when n < 0.9m and rank deficient 



when n > 0.95m. Figure 6.4 shows that the method converges quite quickly even on large matrices 
in both the well conditioned and the rank deficient cases. On smaller matrices of this type we were 
able to assess the accuracy of the method. For m = 1000, n = 900 yielded a relative error of 22% 
(the Lanczos estimate was off by 78%), and n = 450 yielded a relative error of 41% (the Lanczos 
estimate was off by 18%). Problems of this type of size m = 1, 000, 000 required similar number of 
iterations and were easily solved on a laptop. It is worth noting that due to the random structure of 
the non-zero pattern, it is likely that factorization based condition number estimators will be very 
slow when applied to this type of matrices. 

6.3 Experiments on Many Real- World Matrices 

We ran both a dense SVD and our method on all the matrices from Tim Davis's sparse matrix 
collection |4] for which mn 2 < 256 x 10 9 . Our method converged in 100,000 iterations or less on 
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Figure 6.4: Random matrices with 3 nonzeros per row. On the left we see the convergence on a 
100, 000-by-90, 000 matrix and on the left convergence on a 100, 000-by-95, 000 matrix. 
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777 



n time (s) iterations k (est.) 



flower_7_4 
flower_8_4 
wheel_601 
Franz 11 



lp_ken_18 
lp_pds_20 



raj at 10 



30202 
67593 
125361 
902103 
47104 
154699 
108175 



30202 
27693 
55081 

723605 
30144 

105127 
33874 



43 5219 1.2e+03 

4 231 1.6e+01 

15 537 2.2e+13 

1278 5260 1.3e+14 

1.4 59 3.3e+15 

59 1836 2.5e+14 

13 697 1.2e+14 



Table 1: Large real- world matrices whose condition number was successfully computed by our 
method. 

1024 out of the 1468 matrices in this category. 

Out of the 1468 matrices, 404 had condition number 64/e mac hme ~ 7 x 10 13 or larger. Our 
method converged on 278 out of them, delivering condition number estimates of 5 x 10 11 or larger. 
In other words, on all the matrices that were close to rank deficiency, our method detected that the 
condition number is large, but in some cases it underestimated the actual condition number. 

On matrices with condition number smaller than 64/e mac hine) our method always estimated the 
condition number to within a relative error of 24% or less. 

We ran the method again on some of the matrices on which it failed to converge in 100,000 
iterations, allowing the method to run longer. It converged in all cases. For example, on nosl, the 
method detected convergence after 169,791 iterations. The <Tmin 

estimate it returned was actually 
from iteration 90,173, meaning that at iteration 100,000 it actually converged, but the algorithm 
was not yet able to detect convergence. We note that nosl is a square matrix of dimension 237; the 
method can be slow even on small matrices. 

The running time of our method obviously varies a lot and is not easy to characterize. But on 
large matrices it is often much faster than a dense SVD. On one matrix in our test set, bips98_606 
(a square matrix of dimension 7135), our method was more than 550 times faster than a dense 
SVD, even though the dense SVD routine used all 4 cores in the Intel Core i7 machine where as our 
method used only one. The machine had 16GB of RAM and the SVD computation did not perform 
any paging activity. 

6.4 Experiments on Large Real- World Matrices 

We ran the algorithm on a few very large matrices from the same matrix collection. As on the 
smaller matrices, the method sometimes converged but sometimes it exceeded the maximal number 
of iterations (up to 1,000,000). Table [T] shows the statistics of successful runs; they indicate that 
when the singular spectrum is clustered, the method works well even on very large matrices. 

7 Summary 

We have presented an adaptation of LSQR to the estimation of the condition number of matrices. 

Our method is yet another tool in the condition-number estimation toolbox. It relies almost 
solely on matrix-vector multiplications, so it can be applied to very large sparse matrices. It does 
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not require much memory, and it is at least as fast as a single application of un-preconditioned 
LSQR to solve a least-squares problem. The method is reliable in the sense that it never returns 
an overestimate of the condition number. 

In many cases, the method is orders-of-magnitude faster than competing methods, especially if 
A is large and has no sparse triangular factorization. 

However, the performance of the method depends on the distribution of the singular values of 
A, and some distributions lead to very slow convergence. In such cases, the method still provides a 
lower bound on the condition number, but it may be loose. In such cases, methods that are based 
on an orthogonal or triangular factorizations or on preconditioned iterative solvers may be faster. 

Our method is primarily based on one key insight: that the forward error in LSQR tends to 
converge to an approximate singular vector associated with a m \ n . This property of LSQR and related 
Krylov-subspace solvers is normally seen as a deficiency (because it slows down the convergence to 
the minimizer), but it turns out to be beneficial for condition- number estimation. 
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